Not able to download a CSV file from a website - python

Does anyone know what the problem is with downloading from the site below? I run this code in Jupyter, and nothing happens.
import requests
import os
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
    f.write(response.content)
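When a script like this seems to do nothing, it helps to make it fail loudly. The sketch below uses the standard library's urllib.request instead of requests, only so that it has no third-party dependency. urlopen raises an error for HTTP error statuses or unreachable hosts, the timeout stops it from hanging indefinitely, and the function prints the absolute path it wrote. The function name download_file and the 30-second timeout are illustrative choices, not part of the original code.

```python
import os
import shutil
import urllib.request


def download_file(url, folder, filename, timeout=30):
    """Download url to folder/filename and return the absolute path written.

    Sketch using only the standard library: urlopen raises
    urllib.error.HTTPError for 4xx/5xx responses and URLError when the host
    cannot be reached, so failures are visible instead of silent.
    """
    os.makedirs(folder, exist_ok=True)  # open() will not create missing folders
    path = os.path.abspath(os.path.join(folder, filename))
    with urllib.request.urlopen(url, timeout=timeout) as response, open(path, "wb") as f:
        shutil.copyfileobj(response, f)  # copy in chunks instead of reading everything at once
    print("Saved", path)
    return path
```

For example, download_file('http://www.football-data.co.uk/mmz4281/1920/E0.csv', 'folder', 'E0.csv') would save the season file and print its full path. This also shows where the file went, because relative paths are resolved against the notebook kernel's current working directory.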

I've also tried this code, and it works fine on my side, assuming the folder already exists (open() does not create missing folders) and the file name is valid. Alternatively, you can use pandas, which can read a CSV file directly from a URL. The code would then become:
import pandas as pd
import csv
url = '{Some CSV target}'
df = pd.read_csv(url)
df.to_csv('{absolute path to CSV}', sep=',', index=False, quoting=csv.QUOTE_ALL)

It started working only after I declared a proxy.
I'm practicing on my work laptop, so maybe the local network is blocking my requests.
I hope this helps someone.
Thanks to everyone for your help!
import requests
import os
os.environ['HTTP_PROXY'] = 'your proxy'
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("C:/DownloadLocation", "file.csv"), 'wb') as f:
    f.write(response.content)
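Setting HTTP_PROXY only covers http:// URLs. For https:// URLs, the HTTPS_PROXY variable is used instead. If you would rather not change environment variables for the whole process, you can pass the proxy explicitly. requests accepts a proxies dictionary, as in requests.get(url, proxies={'http': proxy, 'https': proxy}). The sketch below does the equivalent with the standard library's urllib.request. The proxy address is a hypothetical placeholder; use the one your network provides, which may also need credentials in the form http://user:password@host:port.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Return an opener that sends both http and https traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address; replace it with your own.
opener = make_proxy_opener("http://proxy.example.com:8080")
# The network call is left commented out so the sketch runs anywhere:
# data = opener.open("http://www.football-data.co.uk/mmz4281/1920/E0.csv", timeout=30).read()
```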

Related

Getting question marks using Python requests with Excel file

I'm new to Python 3 and requests. I found a dataset on Harvard Dataverse, but I've been stuck for hours trying to extract it. Instead of readable data, I get question marks in the content. I found similar issues, but I'm still unable to solve mine.
Can anyone help me, please?
It would be so much appreciated ;)
Many thanks !!
import requests
import pandas as pd
import csv
import sys
#print(sys.executable)
#print(sys.version)
#print(sys.version_info)
url = "https://dataverse.harvard.edu/api/access/datafile/5856951"
r = requests.get(url)
print(type(r))
print('*************')
print('Response Code:', r.status_code)
print('*************')
print('Response Headers:\n', r.headers)
print('*************')
print('Response Content:\n',r.text)
print(r.encoding)
print(r.content)
with open('myfile.csv', mode='w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(r.text)
df = pd.read_csv('myfile.csv')
data = pd.DataFrame(df)
print("The content of the file is:\n", data)
print(data.head(10))
It seems that the request URL is not returning text at all. Instead, it returns the whole Excel file that contains the dataset you want, and printing that binary content as text is what produces the question marks.
Instead of accessing the response text directly, first save the response content to an Excel file, 'dataset.xlsx', and then read that file to get the results you want.
The following code saves the response to an Excel file. You can then use a library such as openpyxl, or pandas.read_excel, to extract the data (see also https://www.geeksforgeeks.org/reading-excel-file-using-python/). Note that xlrd 2.0 and later only read the older .xls format, not .xlsx.
import requests
url = "https://dataverse.harvard.edu/api/access/datafile/5856951"
resp = requests.get(url)
with open('dataset.xlsx', 'wb') as f:
    f.write(resp.content)
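A quick way to confirm what the server actually sent, before deciding how to read it, is to look at the first few bytes. The helper below is a hypothetical heuristic sketch. .xlsx files are ZIP archives and start with PK\x03\x04, and legacy .xls files start with an OLE2 signature. Anything without NUL bytes in its first kilobyte is treated as text and assumed to be CSV.

```python
ZIP_SIGNATURE = b"PK\x03\x04"  # .xlsx (and other Office Open XML files) are ZIP archives
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls files


def guess_spreadsheet_extension(content: bytes) -> str:
    """Guess a file extension from the leading bytes of a download (heuristic)."""
    if content.startswith(ZIP_SIGNATURE):
        return ".xlsx"  # could also be .docx, .zip, ...; good enough for a spreadsheet download
    if content.startswith(OLE2_SIGNATURE):
        return ".xls"
    if b"\x00" in content[:1024]:
        return ".bin"  # binary data of some other kind
    return ".csv"  # looks like text; assume comma-separated values
```

For example, call guess_spreadsheet_extension(resp.content) before choosing a file name. If it returns '.xlsx', pandas.read_excel(io.BytesIO(resp.content)) can read the bytes without writing a file first (this requires openpyxl to be installed).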

Download files from a website using python

I am new to Python, and I need to download multiple CSV files from a website that requires a username and password.
I wrote the code below to download a single file, but unfortunately the contents of the downloaded file are not the same as in the original file.
Could you please tell me what I am doing wrong here and how to achieve this?
import requests
import shutil
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url="https:xxxxxxxxxxxxxxxxxxxx.aspx/20-02-2019 124316CampaignExport.csv"
r = requests.get(url, auth=('username', 'Password'),
                 verify=False, stream=True)
r.raw.decode_content = True
with open("D:/20-02-2019 124316CampaignExport.csv", 'wb') as f:
    shutil.copyfileobj(r.raw, f)
The following code worked for me (only indenting the last line):
import requests
import shutil
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url="linkToDownload"
r = requests.get(url, auth=('username', 'Password'),
                 verify=False, stream=True)
r.raw.decode_content = True
with open("filename", 'wb') as f:
    shutil.copyfileobj(r.raw, f)
This means the problem stems from your URL or authentication rather than from the Python code itself.
Your URL has a space in it, which is likely causing an error. I can't confirm this for sure, as I don't have your URL. If you have write access to it, try renaming the file with an "_" instead of a space.
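If you want to rule the space out, you can percent-encode the path yourself so that the space becomes %20. requests normally re-quotes URLs like this on its own, so treat this as a diagnostic step rather than a guaranteed fix. The helper name and the example URL are hypothetical, since the real URL is elided in the question.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the path of url."""
    parts = urlsplit(url)
    # Keep "/" as the separator and "%" so already-encoded sequences are not encoded twice.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


print(encode_url_path("https://example.com/export/20-02-2019 124316CampaignExport.csv"))
# https://example.com/export/20-02-2019%20124316CampaignExport.csv
```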

Run URL to download a file with python

I'm working on a program that downloads data from a series of URLs, like this:
https://server/api/getsensordetails.xml?id=sensorID&username=user&password=password
The program goes through a list of about 2,500 IDs and requests the URL for each one. I tried to do it using the following code:
import webbrowser
webbrowser.open(url)
but this code opens the URL in the browser and asks me to confirm the download. I need it to simply download the files, without opening a browser and without having to confirm anything.
Thanks for everything!
You can use the Requests library.
import requests
print('Beginning file download with requests')
url = 'http://PathToFile.jpg'
r = requests.get(url)
with open('pathOfFileToReceiveDownload.jpg', 'wb') as f:
    f.write(r.content)
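For the question's ~2,500 sensor IDs, the same idea just runs in a loop. The sketch below uses the standard library (urllib) rather than requests. It assumes the endpoint from the question's placeholder URL takes id, username and password query parameters. BASE_URL, the output folder and the function names are hypothetical. urlencode escapes characters such as & in passwords, and failed IDs are collected instead of stopping the whole run. With requests, passing params={...} to requests.get performs the same query-string encoding.

```python
import os
import urllib.parse
import urllib.request

# Hypothetical endpoint, taken from the question's placeholder URL.
BASE_URL = "https://server/api/getsensordetails.xml"


def sensor_url(sensor_id, username, password):
    """Build the request URL for one sensor, encoding the query string safely."""
    query = urllib.parse.urlencode({"id": sensor_id, "username": username, "password": password})
    return f"{BASE_URL}?{query}"


def download_sensors(sensor_ids, username, password, folder="sensors", timeout=30):
    """Save each sensor's XML as folder/<id>.xml without opening a browser.

    Returns a list of (sensor_id, exception) pairs for downloads that failed.
    """
    os.makedirs(folder, exist_ok=True)
    failed = []
    for sensor_id in sensor_ids:
        url = sensor_url(sensor_id, username, password)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                data = response.read()
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSError subclasses
            failed.append((sensor_id, exc))
            continue
        with open(os.path.join(folder, f"{sensor_id}.xml"), "wb") as f:
            f.write(data)
    return failed


# Example call (network access required):
# failed = download_sensors(sensor_ids, "user", "password")
```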

How to open .csv file from a url with Python?

I'm trying to open a csv file from a url but for some reason I get an error saying that there is an invalid mode or filename. I'm not sure what the issue is. Help?
url = "http://...."
data = open(url, "r")
read = csv.DictReader(data)
Download the stream, then process:
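import csv
import urllib.request
url = "http://httpbin.org/get"
response = urllib.request.urlopen(url)
data = response.read().decode("utf-8")  # urlopen returns bytes in Python 3
read = csv.DictReader(data.splitlines())  # DictReader needs an iterable of lines, not one string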
I recommend pandas for this:
import pandas as pd
read = pd.read_csv("http://....", ...)
Please see the documentation.
You can do the following:
import codecs
import csv
import urllib.request
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
cr = csv.reader(codecs.iterdecode(response, 'utf-8'))  # decode each bytes line to text
for row in cr:
    print(row)
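In Python 3, urlopen returns bytes while the csv module expects text, so the response has to be decoded before parsing. The following is a minimal sketch that returns dictionaries keyed by the header row. It assumes the file is UTF-8 (use 'utf-8-sig' if it starts with a byte-order mark). The function name read_csv_rows is hypothetical.

```python
import csv
import io
import urllib.request


def read_csv_rows(url, encoding="utf-8", timeout=30):
    """Fetch a CSV file over HTTP and return its rows as a list of dicts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode(encoding)
    # newline="" lets the csv module handle line endings and quoted newlines itself
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

For example, read_csv_rows('http://winterolympicsmedals.com/medals.csv') would return one dictionary per row. Reading the whole body first keeps the code simple. For very large files, wrapping the response in io.TextIOWrapper streams it instead.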
Slightly tongue in cheek:
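>>> import json
>>> for line in open('data.csv'):  # 'data.csv' is a placeholder file name
...     print(json.loads('[' + line + ']'))
CSV is not a well-defined format. JSON is, so this will correctly parse a certain type of CSV every time: lines whose text fields are double-quoted and whose other fields are valid JSON values such as numbers.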
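To see where the trick works and where it does not, you can compare it with the csv module on a sample line. The sample values below are made up for illustration.

```python
import csv
import json


def parse_line_as_json(line):
    """Parse one CSV line by wrapping it in brackets and decoding it as a JSON array."""
    return json.loads("[" + line + "]")


line = '"Norway", 16, 8.5'
print(parse_line_as_json(line))  # ['Norway', 16, 8.5]  (numbers become int/float)
print(next(csv.reader([line], skipinitialspace=True)))  # ['Norway', '16', '8.5']  (all strings)

try:
    parse_line_as_json("Norway,16")  # unquoted text is not valid JSON
except json.JSONDecodeError as exc:
    print("not JSON-compatible:", exc.msg)
```

Unlike csv, the JSON version converts numbers for you. However, it fails on unquoted text and on the doubled quotes ("") that CSV uses to escape a quote inside a field.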

Downloading file from imgur using python directly via url

Sometimes links to Imgur are given without a file extension, for example http://imgur.com/rqCqA. I want to download the file and either give it a known name or get its name inside a larger program. The problem is that I don't know the file type, so I don't know which extension to give it.
How can I achieve this in python or bash?
You should use the Imgur JSON API. Here's an example in Python, using requests:
import posixpath
import urllib.parse
import requests
url = "http://api.imgur.com/2/image/rqCqA.json"
r = requests.get(url)
img_url = r.json()["image"]["links"]["original"]
fn = posixpath.basename(urllib.parse.urlsplit(img_url).path)
r = requests.get(img_url)
with open(fn, "wb") as f:
    f.write(r.content)
I just tried going to the following URLs:
http://imgur.com/rqCqA.jpg
http://imgur.com/rqCqA.png
http://imgur.com/rqCqA.gif
And they all worked. It seems that Imgur stores several types of the same image - you can take your pick.
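Whichever URL you download, you can pick the extension from the bytes you actually receive instead of guessing it. The following is a minimal sketch that assumes only the three formats above matter. The signature table and function name are hypothetical. Checking the response's Content-Type header (for example image/png) is an alternative.

```python
# Leading "magic" bytes for the three formats mentioned above.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def guess_image_extension(data: bytes, default: str = ".bin") -> str:
    """Return a file extension based on the image signature at the start of data."""
    for signature, ext in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return ext
    return default


# Example: name = "rqCqA" + guess_image_extension(downloaded_bytes)
```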
I've used this before to download tons of xkcd webcomics and it seems to work for this as well.
import urllib.request
def saveImage(url, fpath):
    contents = urllib.request.urlopen(url)
    with open(fpath, 'wb') as f:  # images are binary, so write in 'wb' mode
        f.write(contents.read())
Hope this helps
You can parse the source of the page using BeautifulSoup or similar and look for img tags with the photo hash in the src. With your example, the pic is
<img alt="" src="http://i.imgur.com/rqCqA.jpg" original-title="">
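If you prefer to avoid a third-party parser, the standard library's html.parser can do the same search. The sketch assumes the page still contains an img tag like the one above, which may not match Imgur's current markup. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcFinder(HTMLParser):
    """Collect the src of every <img> tag whose src contains a given image hash."""

    def __init__(self, image_hash):
        super().__init__()
        self.image_hash = image_hash
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""
            if self.image_hash in src:
                self.matches.append(src)


def find_image_urls(html, image_hash):
    """Return all img src URLs in html that contain image_hash."""
    finder = ImgSrcFinder(image_hash)
    finder.feed(html)
    finder.close()
    return finder.matches


snippet = '<img alt="" src="http://i.imgur.com/rqCqA.jpg" original-title="">'
print(find_image_urls(snippet, "rqCqA"))  # ['http://i.imgur.com/rqCqA.jpg']
```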
