Why can't I download the whole video with Python? - python

I have a video at a URL that I want to download using Python.
The problem is that when I run the script, the final file is only 1 kB, as if the download never started.
I tried this solution from https://stackoverflow.com/a/16696317/5280246:
import requests
url_video = "https://abiu-tree.fruithosted.net/dash/m/cdtsqmlbpkbmmddq~1504839971~190.52.0.0~w7tv1per/init-a1.mp4"
rsp = requests.get(url_video, stream=True)
print("Downloading video...")
with open("video_test_10.mp4", 'wb') as outfile:
    for chunk in rsp.iter_content(chunk_size=1024):
        if chunk:
            outfile.write(chunk)
rsp.close()
I also tried this:
url_video = "https://abiu-tree.fruithosted.net/dash/m/cdtsqmlbpkbmmddq~1504839971~190.52.0.0~w7tv1per/init-a1.mp4"
rsp = requests.get(url_video)
with open("out.mp4", 'wb') as f:
    f.write(rsp.content)
I also tried:
urllib.request.urlretrieve(url_video, "out.mp4")
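No answer is recorded here, so the following is only a way to narrow the problem down, not a diagnosis. A 1 kB file usually means the server sent something small: an error page, or a stub. The path also contains /dash/ and init-a1.mp4, which looks like the initialization segment of a DASH stream; if that guess from the URL is right, the file would be tiny by design and the media would live in separate segment URLs. The sketch below fails loudly on HTTP errors and prints the response headers before writing. The function name and the `session` argument (assumed to be a requests.Session) are illustrative, not from the original post.

```python
def download_checked(session, url, dest, chunk_size=64 * 1024):
    """Stream url to dest, raising on HTTP errors; return bytes written.

    `session` is assumed to be a requests.Session (or any object with a
    compatible get()). The name and arguments are hypothetical.
    """
    with session.get(url, stream=True, timeout=30) as rsp:
        # 4xx/5xx responses raise here instead of being saved as "video".
        rsp.raise_for_status()
        # These headers show what the server thinks it is sending.
        print("Content-Type:", rsp.headers.get("Content-Type"))
        print("Content-Length:", rsp.headers.get("Content-Length"))
        written = 0
        with open(dest, "wb") as outfile:
            for chunk in rsp.iter_content(chunk_size=chunk_size):
                outfile.write(chunk)
                written += len(chunk)
    return written
```

It would be called as download_checked(requests.Session(), url_video, "video_test_10.mp4"). If the printed Content-Length is already around 1 kB, the script is working and the URL simply does not point at the full video.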

Related

Heroku can't download file using Python

I need the bot to download the file from the link. I am using this function:
def download(url, filename):
    get_response = requests.get(url, stream=True)
    file_name = filename
    with open(file_name, 'wb') as f:
        for chunk in get_response.iter_content(chunk_size=1024):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
And then I send it to the user using this:
with open(music, 'rb') as music_file:
    await bot.send_audio(message.chat.id, music_file)
Everything works great on my computer. But when I run the bot on Heroku, it constantly gives this error:
aiogram.utils.exceptions.BadRequest: File must be non-empty
I tried to add delays, tried to check if the file was downloaded, but nothing helped. Can anyone help?

Requests download with retries creates large corrupt zip

Relative beginner here. I'm trying to complete a basic task with Requests: downloading zip files. It works fine on most downloads, but intermittently writes oversized, corrupt zip files when working with large downloads (>5 GB or so). For example, there is a zip file I know to be ~11 GB that shows up anywhere between 16 and 20 GB, corrupted.
When unzipping in Windows Explorer, I get "The compressed (zipped) folder is invalid". 7-Zip will extract the archive, but says:
Headers Error --- Unconfirmed start of archive --- Warnings: There are some data after the end of the payload data
Interestingly, the 7-Zip dialog shows the correct file size as 11479 MB.
Here's my code:
from pathlib import Path
import requests
save_dir = Path(f"{dirName}/{item_type}/{item_title}.zip")
file_to_resume = save_dir
try:
    with requests.get(url, stream=True, timeout=30) as g:
        with open(save_dir, 'wb') as sav:
            for chunk in g.iter_content(chunk_size=1024*1024):
                sav.write(chunk)
except:
    attempts = 0
    while attempts < 10:
        try:
            resume_header = {'Range':f'bytes = {Path(file_to_resume).stat().st_size}-'}
            with requests.get(url, stream=True, headers=resume_header, timeout=30) as f:
                with open(file_to_resume, 'ab') as sav:
                    for chunk in f.iter_content(chunk_size=1024*1024):
                        sav.write(chunk)
            break
        except:
            attempts += 1
It appears the server did not support the Range header. Thanks, @RonieMartinez.
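That would explain the oversized files: a server that ignores Range answers 200 with the whole body, and the retry loop appends that full copy to the partial file. A minimal sketch of a resume step that guards against this, assuming `session` is a requests.Session (the function name and arguments are illustrative, not from the thread): it appends only when the server answers 206 Partial Content and otherwise starts the file over. The Range value is written in the usual form without spaces, bytes=N-.

```python
import os


def download_with_resume(session, url, dest, chunk_size=1024 * 1024):
    """Resume a partial download only if the server honours Range.

    `session` is assumed to be a requests.Session; all names here are
    hypothetical. Returns the size of dest after the transfer.
    """
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with session.get(url, headers=headers, stream=True, timeout=30) as rsp:
        rsp.raise_for_status()
        # 206 Partial Content: the body starts at `offset`, so append.
        # 200 OK: Range was ignored and the full file is being sent again,
        # so overwrite instead of appending a second copy.
        mode = "ab" if offset and rsp.status_code == 206 else "wb"
        with open(dest, mode) as out:
            for chunk in rsp.iter_content(chunk_size=chunk_size):
                out.write(chunk)
    return os.path.getsize(dest)
```

Calling this inside the retry loop, in place of the unconditional 'ab' open, should keep the file from growing past its real size. Note that if the partial file is already complete, the server may answer 416, which raise_for_status() turns into an exception; that case is not handled here.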

Make a request to download a video in Python

I have links of the form:
http://youtubeinmp3.com/fetch/?video=LINK_TO_YOUTUBE_VIDEO_HERE
If you put links of this type in an <a> tag on a webpage, clicking them will download an MP3 of the youtube video at the end of the link. Source is here.
I'd like to mimic this process from the command-line by making post requests (or something of that sort), but I'm not sure how to do it in Python! Can I get any advice, please, or is this more difficult than I'm making it out to be?
As Mark Ma mentioned, you can get it done without leaving the standard library by utilizing urllib2. I like to use Requests, so I cooked this up:
import os
import requests
dump_directory = os.path.join(os.getcwd(), 'mp3')
os.makedirs(dump_directory, exist_ok=True)
def dump_mp3_for(resource):
    payload = {
        'api': 'advanced',
        'format': 'JSON',
        'video': resource
    }
    initial_request = requests.get('http://youtubeinmp3.com/fetch/', params=payload)
    if initial_request.status_code == 200: # good to go
        download_mp3_at(initial_request)
def download_mp3_at(initial_request):
    j = initial_request.json()
    filename = '{0}.mp3'.format(j['title'])
    r = requests.get(j['link'], stream=True)
    with open(os.path.join(dump_directory, filename), 'wb') as f:
        print('Dumping "{0}"...'.format(filename))
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()
It's then trivial to iterate over a list of YouTube video links and pass them into dump_mp3_for() one-by-one.
for video in ['http://www.youtube.com/watch?v=i62Zjga8JOM']:
    dump_mp3_for(video)
Its API documentation provides a form of the URL that returns the download link as JSON: http://youtubeinmp3.com/fetch/?api=advanced&format=JSON&video=http://www.youtube.com/watch?v=i62Zjga8JOM
We can then use urllib2 to call the API, deserialize the result with json.loads(), and download the MP3 file using urllib2 again.
import urllib2
import json
r = urllib2.urlopen('http://youtubeinmp3.com/fetch/?api=advanced&format=JSON&video=http://www.youtube.com/watch?v=i62Zjga8JOM')
content = r.read()
# extract download link
download_url = json.loads(content)['link']
download_content = urllib2.urlopen(download_url).read()
# save downloaded content to file
f = open('test.mp3', 'wb')
f.write(download_content)
f.close()
Note that the file should be opened in mode 'wb'; otherwise the MP3 file will not play correctly.
If the file is big, downloading will be a time-consuming process. Here is a post that describes how to display download progress in a GUI (PySide).
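urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request. A rough Python 3 equivalent of the snippet above, assuming the API still answers with JSON containing a 'link' key as described (the two function names are made up for illustration). It also copies the body to disk in chunks with shutil.copyfileobj rather than a single read(), which matters for the large files mentioned above.

```python
import json
import shutil
import urllib.request


def fetch_download_link(api_url):
    """Call the JSON API and return the value of its 'link' field."""
    with urllib.request.urlopen(api_url) as rsp:
        return json.loads(rsp.read().decode("utf-8"))["link"]


def save_url(url, dest):
    """Copy the response body to dest in chunks, in binary mode."""
    with urllib.request.urlopen(url) as rsp, open(dest, "wb") as f:
        shutil.copyfileobj(rsp, f)
```

Usage would be save_url(fetch_download_link(api_url), 'test.mp3'), where api_url is the JSON form of the fetch URL shown earlier.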
If you want to download the video or just the audio from YouTube, you can use the pytube module, which does all the hard work.
You can list the audio-only streams:
from pytube import YouTube
# initialize a YouTube object by the url
yt = YouTube("YOUTUBE_URL")
# that will get all audio files available
audio_list = yt.streams.filter(only_audio=True).all()
print(audio_list)
And then download it:
# that will download the file to current working directory
yt.streams.filter(only_audio=True)[0].download()
Complete Code:
from pytube import YouTube
yt = YouTube("YOUTUBE_URL")
audio = yt.streams.filter(only_audio=True).first()
audio.download()

Python: Download CSV file, check return code?

I am downloading multiple CSV files from a website using Python. I would like to be able to check the response code on each request.
I know how to download the file using wget, but not how to check the response code:
os.system('wget http://example.com/test.csv')
I've seen a lot of people suggesting using requests, but I'm not sure that's quite right for my use case of saving CSV files.
r = requests.get('http://example.com/test.csv')
r.status_code # 200
# Pipe response into a CSV file... hm, seems messy?
What's the neatest way to do this?
You can use the stream argument - along with iter_content() it's possible to stream the response contents right into a file (docs):
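import requests
r = None
try:
    r = requests.get('http://example.com/test.csv', stream=True)
    # iter_content() yields bytes, so the file must be opened in binary mode
    with open('test.csv', 'wb') as f:
        # the default chunk_size is 1 byte; use a larger chunk for speed
        for data in r.iter_content(chunk_size=8192):
            f.write(data)
finally:
    if r is not None:
        r.close()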
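To cover the response-code part of the question as well, one option is a small helper that returns the status code and writes the file only on 200. This is a sketch under assumptions: `session` is taken to be a requests.Session, and the function name is hypothetical.

```python
def save_csv(session, url, dest):
    """Download url to dest and return the HTTP status code.

    The file is written only for a 200 response. `session` is assumed to be
    a requests.Session; the name save_csv is illustrative.
    """
    with session.get(url, stream=True, timeout=30) as r:
        if r.status_code == 200:
            with open(dest, "wb") as f:
                for data in r.iter_content(chunk_size=8192):
                    f.write(data)
        return r.status_code
```

For several files, the caller can loop over the URLs and collect the returned codes, for example to log every download that did not come back as 200.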

Python image scraper works when run alone but not when called from somewhere else

Python 2.7
Ubuntu 12.04
I'm trying to put together an image scraper. I've done this before with no problems, but right now I'm stuck at a certain point.
I get a list of image links from somewhere, either a web page or a user; let's assume that they are valid links.
The site I am scraping is imgur. Some of the links won't work because I haven't added support for them (single files). I have the code for getting the links for each image from an album page, and it works and returns links like:
http://i.imgur.com/5329B8H.jpg #(intentionally broken link)
The image_download function as I have it in my actual program:
def image_download(self, links):
    for each in links:
        url = each
        name = each.split('/')[-1]
        r = requests.get(url, stream=True)
        with open(name, 'wb') as f:
            for chunk in r.iter_content(1024):
                if not chunk:
                    break
            f.write(chunk)
The image_download function as I have it for testing, to be run on its own:
def down():
    links = ['link-1', 'link-2']
    for each in links:
        name = each.split('/')[-1]
        r = requests.get(each, stream=True)
        with open(name, 'wb') as f:
            for chunk in r.iter_content(1024):
                if not chunk:
                    break
                f.write(chunk)
Now here's the thing, the second one works.
They both take the same input, they both do the same thing.
The first one does return a file with the correct name and extension, but the file size is different: about 960 bytes, as opposed to about 200 kB from the second one.
When I print the request, both return a response of 200. I've tried printing the output at different points, and as far as I can see they operate in exactly the same way with exactly the same data; they just don't give back the same information.
What is going on here?
You need to indent f.write(chunk) one more time. You are only writing the last chunk to the file right now.
The corrected function will look like this:
def image_download(self, links):
    for each in links:
        url = each
        name = each.split('/')[-1]
        r = requests.get(url, stream=True)
        with open(name, 'wb') as f:
            for chunk in r.iter_content(1024):
                if not chunk:
                    break
                f.write(chunk) # This has been indented to be in the for loop.
