I was building an image-downloading project for a website and ran into some strange behavior with tqdm. The code below includes two ways of creating the tqdm progress bar. In Option 1 I do not pass the iterable from the response to tqdm directly, while in Option 2 I do. Although the two versions look similar, the results are strangely different.
This is what the progress bar's result looks like using Option 1
This is what the progress bar's result looks like using Option 2
Option 1 gives the result I want, but I can't find an explanation for how Option 2 behaves. Can anyone help me explain this behavior?
import requests
from tqdm import tqdm
import os

# Folder to store in
default_path = "D:\\Downloads"


def download_image(url):
    """
    This function will download the given url's image with proper filename labeling
    If a path is not provided the image will be downloaded to the Downloads folder
    """
    # Establish a Session with cookies
    s = requests.Session()
    # Fix for pixiv's request you have to add referer in order to download images
    response = s.get(url, headers={'User-Agent': 'Mozilla/5.0',
                                   'referer': 'https://www.pixiv.net/'}, stream=True)
    file_name = url.split("/")[-1]  # Retrieve the file name of the link
    together = os.path.join(default_path, file_name)  # Join together path with the file_name. Where to store the file
    file_size = int(response.headers["Content-Length"])  # Get the total byte size of the file
    chunk_size = 1024  # Consuming in 1024 byte per chunk
    # Option 1
    progress = tqdm(total=file_size, unit='B', unit_scale=True, desc="Downloading {file}".format(file=file_name))
    # Open the file destination and write in binary mode
    with open(together, "wb") as f:
        # Loop through each of the chunks in response in chunk_size and update the progress by calling update using
        # len(chunk) not chunk_size
        for chunk in response.iter_content(chunk_size):
            f.write(chunk)
            progress.update(len(chunk))
    # Option 2
    """progress = tqdm(response.iter_content(chunk_size),total=file_size, unit='B', unit_scale=True, desc="Downloading {file}".format(file = file_name))
    with open(together, "wb") as f:
        for chunk in progress:
            progress.update(len(chunk))
            f.write(chunk)
    # Close the tqdm object and file object as good practice
    """
    progress.close()
    f.close()


if __name__ == "__main__":
    download_image("Image Link")
Looks like an existing bug with tqdm. https://github.com/tqdm/tqdm/issues/766
Option 1:
Provides tqdm the total size
On each iteration, update progress. Expect the progress bar to keep moving.
Works fine.
Option 2:
Provides tqdm the total size along with the chunk iterator, which tqdm wraps and iterates itself.
On each iteration, tqdm advances the bar automatically, but only by one per chunk yielded, not by the chunk's size in bytes.
However, you also call progress.update manually, so your byte counts get mixed with tqdm's own per-iteration counting, which should not be the case.
When tqdm wraps the iterable, you would normally let it do the counting.
But with the total given in bytes this doesn't work either, and the issue is already reported.
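A minimal sketch of the mismatch, using only the standard library and a made-up 10,000-byte payload in place of a real response. Here `split_into_chunks` is a hypothetical stand-in for `response.iter_content`, and the two counters mimic what each option counts: a bar that wraps the iterator advances once per chunk, while a manual `update(len(chunk))` advances by bytes.

```python
def split_into_chunks(payload: bytes, chunk_size: int):
    """Yield successive chunk_size-byte slices, roughly like a streamed body."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


payload = bytes(10_000)  # made-up stand-in for a 10,000-byte download
chunk_size = 1024

items_seen = 0  # what a bar wrapping the iterator counts: one per chunk
bytes_seen = 0  # what update(len(chunk)) counts: bytes written

for chunk in split_into_chunks(payload, chunk_size):
    items_seen += 1
    bytes_seen += len(chunk)

print(items_seen)  # number of chunks
print(bytes_seen)  # number of bytes, equal to len(payload)
```

With `total` set to the byte count, only the second counter can ever reach it, which is why Option 1 lines up with the total and a bar driven by the iterator alone does not.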
Suggestion on Option 1:
To avoid closing the file and the progress bar manually, you can create both in a with statement; tqdm supports this as well.
# Open the file destination and write in binary mode
with tqdm(total=file_size,
          unit='B',
          unit_scale=True,
          desc="Downloading {file}".format(file=file_name)
          ) as progress, open(file_name, "wb") as f:
    # Loop through each of the chunks in response in chunk_size and update the progress by calling update using
    # len(chunk) not chunk_size
    for chunk in response.iter_content(chunk_size):
        progress.update(len(chunk))
        f.write(chunk)
Related
I have created a program that downloads mp4 videos from a link. I've added the code below
chunk_size = 256
URL = 'https://gogodownload.net/download.php?url=aHR0cHM6LyAdrefsdsdfwerFrefdsfrersfdsrfer363435349URASDGHUSRFSJGYfdsffsderFStewthsfSFtrftesdfseWVpYnU0bmM3LmdvY2RuYW5pLmNvbS91c2VyMTM0Mi9lYzBiNzk3NmM1M2Q4YmY5MDU2YTYwNjdmMGY3ZTA3Ny9FUC4xLnYwLjM2MHAubXA0P3Rva2VuPW8wVnNiR3J6ZXNWaVA0UkljRXBvc2cmZXhwaXJlcz0xNjcxOTkzNzg4JmlkPTE5MzU1Nw=='
x = requests.head(URL)
y = requests.head(x.headers['Location'])
file_size = int(y.headers['content-length'])
response = requests.get(URL, stream=True)
with open('video.mp4', 'wb') as f:
    for chunk in response.iter_content(chunk_size=chunk_size):
        f.write(chunk)
This code works properly and downloads the video, but I want to add a live progress bar. I tried using alive-progress (code added below), but it didn't work properly.
def compute():
    response = requests.get(URL, stream=True)
    with open('video.mp4', 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            yield 256

with alive_bar(file_size) as bar:
    for i in compute():
        bar()
This is the response I got, however the file downloaded properly
Any help?
This is the code you should try. I changed chunk_size to 1024 and converted the file size into KB (kilobytes), so the bar's total is now in the same unit as the chunks. It shows the values properly, along with the percentage.
As for the weird boxes showing up in your terminal, that is because your terminal does not support the encoding used by the library. Try changing your terminal settings to use UTF-8.
file_size = int(int(y.headers['content-length']) / 1024)
chunk_size = 1024

def compute():
    response = requests.get(URL, stream=True)
    with open('video.mp4', 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            yield 1024

with alive_bar(file_size) as bar:
    for i in compute():
        bar()
Most probably this is happening because your terminal cannot display the characters being printed. Edit the settings of your terminal app to use UTF-8 encoding.
Also give tqdm a try. It is a more renowned library, so you know it has been tried and tested.
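To see why the units matter: alive_bar(total) expects bar() to be called total times, so a total in bytes combined with one bar() call per 256-byte chunk leaves the bar far short of 100%. Below is a small sketch of the arithmetic with a made-up file size. `expected_bar_calls` is a hypothetical helper, and it assumes every chunk except the last one is full, which a real streamed response does not strictly guarantee.

```python
import math


def expected_bar_calls(file_size_bytes: int, chunk_size: int) -> int:
    """Number of chunks (and therefore bar() calls) needed to cover the file."""
    return math.ceil(file_size_bytes / chunk_size)


file_size_bytes = 1_000_000  # made-up size, not taken from the question

calls_256 = expected_bar_calls(file_size_bytes, 256)
calls_1024 = expected_bar_calls(file_size_bytes, 1024)
floored_total = file_size_bytes // 1024  # what int(size / 1024) produces

print(calls_256)      # far fewer than 1,000,000, so a byte total is never reached
print(calls_1024)     # one call per 1024-byte chunk
print(floored_total)  # can be one less than calls_1024 when the last chunk is partial
```

Under these assumptions, rounding the total up instead of down keeps it equal to the number of bar() calls even when the final chunk is partial.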
I'm trying to download some pictures from a website. When I download them with the browser, they are much smaller than the ones downloaded with my code. They have the same resolution as the ones downloaded with my code, but the difference in file size is very large!
def download(url, pathname):
    """
    Downloads a file given an URL and puts it in the folder `pathname`
    """
    # if path doesn't exist, make that path dir
    if not os.path.isdir(pathname):
        os.makedirs(pathname)
    # download the body of response by chunk, not immediately
    response = requests.get(url, stream=True)
    # get the total file size
    file_size = int(response.headers.get("Content-Length", 0))
    # get the file name
    filename = os.path.join(pathname, url.split("/")[-1])
    # progress bar, changing the unit to bytes instead of iteration (default by tqdm)
    progress = tqdm(response.iter_content(1024), f"Downloading {filename}", total=file_size, unit="B", unit_scale=True,
                    unit_divisor=1024, disable=True)
    with open(filename, "wb") as f:
        for data in progress.iterable:
            # write data read to the file
            f.write(requests.get(url).content)
            # update the progress bar manually
            progress.update(len(data))
Example: https://wallpaper-mania.com/wp-content/uploads/2018/09/High_resolution_wallpaper_background_ID_77700030695.jpg
Browser: About 222 KB
Code: 48,4 MB
How does this difference come about? How can I improve the code so that the downloaded images are smaller?
f.write(requests.get(url).content)
It looks like you're re-downloading the entire file for every 1024 byte chunk, so you're getting 222 copies of the image. Make that:
f.write(data)
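The numbers in the question are consistent with this. A rough check, assuming the file is exactly 222 KiB (the question only says "about 222 KB") and that every chunk is a full 1024 bytes:

```python
browser_size = 222 * 1024  # assumed exact size in bytes; the question says "about 222 KB"
chunk_size = 1024          # chunk size passed to iter_content in the question

# One loop iteration per chunk, and each iteration writes the whole file again
iterations = -(-browser_size // chunk_size)  # ceiling division
written = iterations * browser_size

print(iterations)                        # number of full re-downloads
print(round(written / 1024 / 1024, 1))   # total written, in MiB
```

That comes out at roughly 48 MiB, in line with the 48,4 MB reported for the file written by the code.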
for i, pokemon in enumerate(pokemon):
    pokeurl = f"https://img.pokemondb.net/sprites/bank/normal/{pokemon}.png"
    r = requests.get(pokeurl, stream=True)
    open('pokemon.png', 'wb').write(r.content)
    #do_stuff
Hi, I'm kind of new to Python. The first time the function is called, the images are saved (6 times, the size of pokemon). But the second time I call the same function, it saves a corrupted image.
I'd imagine it's down to how you open the file.
As written in the docs:
Calling f.write() without using the with keyword or calling f.close() might result in the arguments of f.write() not being completely written to the disk, even if the program exits successfully.
So just use a context manager:
with open('pokemon.png', 'wb') as f:
    f.write(r.content)
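A small self-contained sketch of what the context manager guarantees, using a temporary directory and made-up bytes in place of r.content (both are assumptions for illustration only):

```python
import os
import tempfile

data = b"\x89PNG made-up image bytes"  # stand-in for r.content
path = os.path.join(tempfile.mkdtemp(), "pokemon.png")

with open(path, "wb") as f:
    f.write(data)

# Leaving the with block has closed the file, which also flushes it to disk
print(f.closed)

with open(path, "rb") as check:
    saved = check.read()
print(saved == data)
```

Because the file is closed as soon as the block ends, a later read or a second call that reuses the same name sees the complete contents.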
Your code works fine after using a unique name to save each image locally:
import requests
import os
pokemons = [
"glaceon", "aerodactyl", "charizard", "blastoise", "greninja", "haxorus",
"flareon", "pikachu"
]
for i, pokemon in enumerate(pokemons):
    pokeurl = f"https://img.pokemondb.net/sprites/bank/normal/{pokemon}.png"
    r = requests.get(pokeurl, stream=True)
    open(f'/tmp/p/{pokemon}.png', 'wb').write(r.content)
# verify images saved
print(os.listdir('/tmp/p'))
Out:
['charizard.png', 'flareon.png', 'pikachu.png', 'haxorus.png', 'blastoise.png', 'aerodactyl.png', 'greninja.png', 'glaceon.png']
I have a console script which uses ftplib as a backend to get a number of files from an ftp server. I would like to use tqdm to give the user some feedback provided they have a "verbose" switch on. This must be optional as some users might use the script without tty access.
The ftplib's retrbinary method takes a callback so it should be possible to hook tqdm in there somehow. However, I have no idea what this callback would look like.
From FTP.retrbinary:
The callback function is called for each block of data received, with a single string argument giving the data block.
So the callback could be something like:
with open(filename, 'wb') as fd:
    total = ftpclient.size(filename)
    with tqdm(total=total) as pbar:
        def callback_(data):
            l = len(data)
            pbar.update(l)
            fd.write(data)
    ftpclient.retrbinary('RETR {}'.format(filename), callback_)
Beware: This code is untested and probably has to be adapted.
That code shouldn't work as pbar will be "closed" when the with block terminates, which occurs just before ftpclient.retrbinary(...). You need a very minor indentation mod:
with open(filename, 'wb') as fd:
    total = ftpclient.size(filename)
    with tqdm(total=total,
              unit='B', unit_scale=True, unit_divisor=1024,
              disable=not verbose) as pbar:
        def cb(data):
            pbar.update(len(data))
            fd.write(data)
        ftpclient.retrbinary('RETR {}'.format(filename), cb)
EDIT added disable flag and bytes scaling
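The callback logic can be tried without an FTP server. The sketch below is an illustration only: `CountingBar` is a hypothetical stand-in for a tqdm bar (it just records update() calls), `make_callback` is a hypothetical helper, and the loop over three made-up blocks plays the role of retrbinary calling the callback once per block received.

```python
import io


class CountingBar:
    """Hypothetical stand-in for a tqdm bar: it only records update() calls."""

    def __init__(self):
        self.n = 0

    def update(self, amount):
        self.n += amount


def make_callback(fd, pbar):
    """Build a retrbinary-style callback that writes each block and advances the bar."""
    def cb(data):
        fd.write(data)
        pbar.update(len(data))
    return cb


# Simulate retrbinary delivering three blocks to the callback
fd = io.BytesIO()
bar = CountingBar()
cb = make_callback(fd, bar)
for block in (b"a" * 8192, b"b" * 8192, b"c" * 100):
    cb(block)

print(bar.n, len(fd.getvalue()))
```

With a real transfer, the same callback would be passed to retrbinary while both the file and the progress bar are still open, as in the corrected indentation above.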
I am new to Python. Here is my environment setup:
I have Anaconda 3 (Python 3). I would like to be able to download a CSV file from the website:
https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD
I would like to use the requests library. I would appreciate any help in figuring out how I can use the requests library to download the CSV file to a local directory on my machine.
It is recommended to download the data as a stream and write it out to the target file, or to an intermediate local file, as it arrives.
import requests


def download_file(url, output_file, compressed=True):
    """
    compressed: enable response compression support
    """
    # NOTE the stream=True parameter. It enables more optimized, buffered loading of the data.
    headers = {}
    if compressed:
        headers["Accept-Encoding"] = "gzip"
    r = requests.get(url, headers=headers, stream=True)
    with open(output_file, 'wb') as f:  # open for binary write
        for chunk in r.iter_content(chunk_size=4096):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
        f.flush()  # After all, force data flush into output file (optional)
    return output_file
Considering original post:
remote_csv = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
local_output_file = "test.csv"
download_file(remote_csv, local_output_file)
#Check file content, just for test purposes:
print(open(local_output_file).read())
Base code was extracted from this post: https://stackoverflow.com/a/16696317/176765
Here, you can have more detailed information about body stream usage with requests lib:
http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow