Download csv file using python 3

Download csv file using python 3 - python

I am new to Python. Here is my environment setup:
I have Anaconda 3 ( Python 3). I would like to be able to download an CSV file from the website:
https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD
I would like to use the requests library. I would appreciate anyhelp in figuring our how I can use the requests library in downloading the CSV file to the local directory on my machine

It is recommended to download data as stream, and flush it into the target or intermediate local file.
import requests
def download_file(url, output_file, compressed=True):
"""
compressed: enable response compression support
"""
# NOTE the stream=True parameter. It enable a more optimized and buffer support for data loading.
headers = {}
if compressed:
headers["Accept-Encoding"] = "gzip"
r = requests.get(url, headers=headers, stream=True)
with open(output_file, 'wb') as f: #open as block write.
for chunk in r.iter_content(chunk_size=4096):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
f.flush() #Afterall, force data flush into output file (optional)
return output_file
Considering original post:
remote_csv = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
local_output_file = "test.csv"
download_file(remote_csv, local_output_file)
#Check file content, just for test purposes:
print(open(local_output_file).read())
Base code was extracted from this post: https://stackoverflow.com/a/16696317/176765
Here, you can have more detailed information about body stream usage with requests lib:
http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow

Related

How to save image which sent via flask send_file

I have this code for server
#app.route('/get', methods=['GET'])
def get():
return send_file("token.jpg", attachment_filename=("token.jpg"), mimetype='image/jpg')
and this code for getting response
r = requests.get(url + '/get')
And i need to save file from response to hard drive. But i cant use r.files. What i need to do in these situation?

Assuming the get request is valid. You can use use Python's built in function open, to open a file in binary mode and write the returned content to disk. Example below.
file_content = requests.get('http://yoururl/get')
save_file = open("sample_image.png", "wb")
save_file.write(file_content.content)
save_file.close()
As you can see, to write the image to disk, we use open, and write the returned content to 'sample_image.png'. Since your server-side code seems to be returning only one file, the example above should work for you.

You can set the stream parameter and extract the filename from the HTTP headers. Then the raw data from the undecoded body can be read and saved chunk by chunk.
import os
import re
import requests
resp = requests.get('http://127.0.0.1:5000/get', stream=True)
name = re.findall('filename=(.+)', resp.headers['Content-Disposition'])[0]
dest = os.path.join(os.path.expanduser('~'), name)
with open(dest, 'wb') as fp:
while True:
chunk = resp.raw.read(1024)
if not chunk: break
fp.write(chunk)

Immediate write JSON API response to file with Python Requests

I am trying to retrieve data from an API and immediate write the JSON response directly to a file and not store any part of the response in memory. The reason for this requirement is because I'm executing this script on a AWS Linux EC2 that only has 2GB of memory, and if I try to hold everything in memory and then write the responses to a file, the process will fail due to not enough memory.
I've tried using f.write() as well as sys.stdout.write(), but both of these approaches seemed to only write the file after all the queries were executed. While this worked with my small example, it didn't work when dealing with my actual data.
The problem with both approaches below is that the file doesn't populate until the loop is complete. This will not work with my actual process, as the machine doesn't have enough memory to hold the all the responses in memory.
How can I adapt either of the approaches below, or come up with something new, to write data received from the API immediately to a file without saving anything in memory?
Note: I'm using Python 3.7 but happy to update if there is something that would make this easier.
My Approach 1
# script1.py
import requests
import json
with open('data.json', 'w') as f:
for i in range(0, 100):
r = requests.get("https://httpbin.org/uuid")
data = r.json()
f.write(json.dumps(data) + "\n")
f.close()
My Approach 2
# script2.py
import request
import json
import sys
for i in range(0, 100):
r = requests.get("https://httpbin.org/uuid")
data = r.json()
sys.stdout.write(json.dumps(data))
sys.stdout.write("\n")
With approach 2, I tried using the > to redirect the output to a file:
script2.py > data.json

You can use response.iter_content to download the content in chunks. For example:
import requests
url = 'https://httpbin.org/uuid'
with requests.get(url, stream=True) as r:
r.raise_for_status()
with open('data.json', 'wb') as f_out:
for chunk in r.iter_content(chunk_size=8192):
f_out.write(chunk)
Saves data.json with content:
{
"uuid": "991a5843-35ca-47b3-81d3-258a6d4ce582"
}

Python getting error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 10: invalid start byte" [duplicate]

If I have a URL that, when submitted in a web browser, pops up a dialog box to save a zip file, how would I go about catching and downloading this zip file in Python?

As far as I can tell, the proper way to do this is:
import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()
of course you'd want to check that the GET was successful with r.ok.
For python 3+, sub the StringIO module with the io module and use BytesIO instead of StringIO: Here are release notes that mention this change.
import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")

Most people recommend using requests if it is available, and the requests documentation recommends this for downloading and saving raw data from a url:
import requests
def download_url(url, save_path, chunk_size=128):
r = requests.get(url, stream=True)
with open(save_path, 'wb') as fd:
for chunk in r.iter_content(chunk_size=chunk_size):
fd.write(chunk)
Since the answer asks about downloading and saving the zip file, I haven't gone into details regarding reading the zip file. See one of the many answers below for possibilities.
If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.
import urllib.request
def download_url(url, save_path):
with urllib.request.urlopen(url) as dl_file:
with open(save_path, 'wb') as out_file:
out_file.write(dl_file.read())
Finally, if you are using Python 2 still, you can use urllib2.urlopen.
from contextlib import closing
def download_url(url, save_path):
with closing(urllib2.urlopen(url)) as dl_file:
with open(save_path, 'wb') as out_file:
out_file.write(dl_file.read())

With the help of this blog post, I've got it working with just requests.
The point of the weird stream thing is so we don't need to call content
on large requests, which would require it to all be processed at once,
clogging the memory. The stream avoids this by iterating through the data
one chunk at a time.
url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'
response = requests.get(url, stream=True)
with open('alaska.zip', "wb") as f:
for chunk in response.iter_content(chunk_size=512):
if chunk: # filter out keep-alive new chunks
f.write(chunk)

Here's what I got to work in Python 3:
import zipfile, urllib.request, shutil
url = 'http://www....myzipfile.zip'
file_name = 'myzip.zip'
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
shutil.copyfileobj(response, out_file)
with zipfile.ZipFile(file_name) as zf:
zf.extractall()

Super lightweight solution to save a .zip file to a location on disk (using Python 3.9):
import requests
url = r'https://linktofile'
output = r'C:\pathtofolder\downloaded_file.zip'
r = requests.get(url)
with open(output, 'wb') as f:
f.write(r.content)

Either use urllib2.urlopen, or you could try using the excellent Requests module and avoid urllib2 headaches:
import requests
results = requests.get('url')
#pass results.content onto secondary processing...

I came here searching how to save a .bzip2 file. Let me paste the code for others who might come looking for this.
url = "http://api.mywebsite.com"
filename = "swateek.tar.gz"
response = requests.get(url, headers=headers, auth=('myusername', 'mypassword'), timeout=50)
if response.status_code == 200:
with open(filename, 'wb') as f:
f.write(response.content)
I just wanted to save the file as is.

Thanks to #yoavram for the above solution,
my url path linked to a zipped folder, and encounter an error of BADZipfile
(file is not a zip file), and it was strange if I tried several times it
retrieve the url and unzipped it all of sudden so I amend the solution a little
bit. using the is_zipfile method as per here
r = requests.get(url, stream =True)
check = zipfile.is_zipfile(io.BytesIO(r.content))
while not check:
r = requests.get(url, stream =True)
check = zipfile.is_zipfile(io.BytesIO(r.content))
else:
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

Use requests, zipfile and io python packages.
Specially BytesIO function is used to keep the unzipped file in memory rather than saving it into the drive.
import requests
from zipfile import ZipFile
from io import BytesIO
r = requests.get(zip_file_url)
z = ZipFile(BytesIO(r.content))
file = z.extract(a_file_to_extract, path_to_save)
with open(file) as f:
print(f.read())

Python: Download CSV file, check return code?

I am downloading multiple CSV files from a website using Python. I would like to be able to check the response code on each request.
I know how to download the file using wget, but not how to check the response code:
os.system('wget http://example.com/test.csv')
I've seen a lot of people suggesting using requests, but I'm not sure that's quite right for my use case of saving CSV files.
r = request.get('http://example.com/test.csv')
r.status_code # 200
# Pipe response into a CSV file... hm, seems messy?
What's the neatest way to do this?

You can use the stream argument - along with iter_content() it's possible to stream the response contents right into a file (docs):
import requests
r = None
try:
r = requests.get('http://example.com/test.csv', stream=True)
with open('test.csv', 'w') as f:
for data in r.iter_content():
f.write(data)
finally:
if r is not None:
r.close()

.torrent files downloaded using python urllib2 fail to open in bittorrent client

I'm using this code to download .torrent files:
torrent = urllib2.urlopen(torrent URL, timeout = 30)
output = open('mytorrent.torrent', 'wb')
output.write(torrent.read())
The resultant mytorrent.torrent file doesn't open in any bittorrent client and throws up "unable to parse meta file" error. The problem apparently is that although the torrent URL (e.g. http://torcache.com/torrent-file-1.torrent) ends with a '.torrent' suffix, it is compressed using gzip and needs to be uncompressed and then saved as a torrent file. I've confirmed this by unzipping the file in terminal:gunzip mytorrent.torrent > test.torrent and opening the file in the bittorrent client which opens fine.
How do I modify python to look up the file encoding and figure out how the file is compressed and use the right tool to uncompress it and save as a .torrent file?

gzip'ed data must be unziped. You can deteted this, if you look out for the content-encoding header.
import gzip, urllib2, StringIO
req = urllib2.Request(url)
opener = urllib2.build_opener()
response = opener.open(req)
data = response.read()
if response.info()['content-encoding'] == 'gzip':
gzipper = gzip.GzipFile(StringIO(fileobj=data))
plain = gzipper.read()
data = plain
output.write(data)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Download csv file using python 3 - python

Related

How to save image which sent via flask send_file

Immediate write JSON API response to file with Python Requests

Python getting error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 10: invalid start byte" [duplicate]

Python: Download CSV file, check return code?

.torrent files downloaded using python urllib2 fail to open in bittorrent client

Categories

Resources